On the basic computational structure of gene regulatory networks 
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Gene regulatory networks constitute the first layer of the cellular computation for cell adaptation 
and surveillance. In these webs, a set of causal relations is built up from thousands of interactions 
between transcription factors and their target genes. The large size of these webs and their entangled 
nature make difficult to achieve a global view of their internal organisation. Here, this problem has 
been addressed through a comparative study for Escherichia coli, Bacillus subtilis and Saccharomyces 
cerevisiae gene regulatory networks. We extract the minimal core of causal relations, uncovering 
the hierarchical and modular organisation from a novel dynamical/causal perspective. Our results 
reveal a marked top-down hierarchy containing several small dynamical modules for E. coli and 
B. subtilis. Conversely, the yeast network displays a single but large dynamical module in the 
middle of a bow-tie structure. We found that these dynamical modules capture the relevant wiring 
among both common and organism-specific biological functions such as transcription initiation, 
metabolic control, signal transduction, response to stress, sporulation and cell cycle. Functional 
and topological results suggest that two fundamentally different forms of logic organisation may 
have evolved in bacteria and yeast. 
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I. INTRODUCTION 

The pattern of regulatory interactions linking tran- 
scription factors (TFs) to their target genes constitutes 
the first level of a multilayered network of gene regula- 
tion; the so called gene transcriptional regulatory net- 
works (GRN) [l[ . Some of these patterns have been re- 
covered from genome- wide approaches, particularly well 
established for Escherichia coli [2], y, |J] and Bacillus Sub- 
tilis 0, Q as well as the unicellular eukaryote Saccha- 
romyces cerevisiae 0, [8| . In such a p icture, both hierar- 
chical dEQlEI and modular @, [S El E3 components 
have been repeatedly highlighted, although no general 
agreement exists on what scale of analysis more accu- 
rately captures global complexity [l5j . 

The conceptualisation of cellular interaction maps 
within the framework of graph theory p| [ill Il7l has pro- 
vided powerful insights on their hierarchical jl8l. QjJ [H 
[2lj . modular organisation [22|, [H, HH, [H| as well as it 
sheds light on the topological constraints imposed in its 
dynamical behaviour @, Ull- However, their quantifica- 
tion, even their identification has led to a nonuniform 
concept of module under functional, topological, evolu- 
tionary and developmental criteria [27j,[28|. Similarly, the 
observed hierarchy seldom matches an ideal feed-forward 
relation between components [2^ |. 

An alternative approach considers looking at GRNs in 
a more fundamental way, namely as sets of causal rela- 
tions, featured by the directed nature of the link between 
a transcription factor with its target genes [J, [3(| ■ Causal 
links, namely who acts on whom, allows to actually de- 
fine the skeleton underlying dynamics by attending to 
the cyclic and linear nature of the genetic circuits. 

Figure Q] illustrates the resulting dynamics of a toy 
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FIG. 1: A small toy model (a) and its dynamic behaviour. 
The observed pattern is generated by the activity of two ba- 
sic feedback loops, indicated as A and B in (a) . These subsets 
are responsible for the qualitative dynamics exhibited by the 
net, as shown in (b) with filled circles. Moreover, this attrac- 
tor results from the combination of the two different periodic 
orbits displayed by the two basic modules, whose time series 
are shown in (c-d). For a mathematical treatment of this 
specific example see appendix. 



model (see mathematical appendix for formal details). 
In this model we observe that the complex, periodic be- 
haviour may arise solely due to the dynamics of the two 
subgraphs (A and B in figure |T^i) exhibiting feedback 
loops. The remaining system simply reacts to the dy- 
namical inputs generated from these modules. Interest- 
ingly, these mentioned topological traits defines the dy- 
namical behaviour no mattering what kind of dynamical 
rules are used to causally relate our elements. Some el- 
ements will play the leading role, determining the quali- 
tative type of dynamics, whereas others will just amplify 
or reduce the core signals. This view of interactions is 
at the core of a relational picture of a biological com- 
putation [3lj . This is a computational perspective and, 
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not surprisi ngly GRNs have been compared to comput- 
ers [HI, HI, [H, [H, [H[ . Cellular computations pervade 
both the diverse responses to external stim uli |36l H3] as 
well as cell robustness and plasticity [38, 39ll40j|. A num- 
ber of dynamical approximations suggest a link between 
network organization and its dynamical behaviour. How- 
ever, due to the large size of these systems, only a few 
small ones [4lj, |42J have been fully analysed. 

Ideally, it would be desirable to have a method to con- 
struct a graph capturing all non-trivial causal relations 
and thus all potentially important computational links. 
In this paper we show that such organisation of GRNs is 
captured by the principle of causality depicted by the di- 
rected nature of gene-gene relations. This can uncovered 
by exploiting the properties of directed graphs, in partic- 
ular, by the use of leaf removal algorithms (LRAs) [oll3"oj 
and the identification of the so-called strongly connected 
components (SCCs) As we will see, LRAs recover 

the network fraction responsible for potentially non triv- 
ial computations. Furthermore, the internal organisation 
of this especial subgraph can be topologically dissected 
by SCC identification. 

A SCC is defined for a set of vertices if there is a walk 
-attending to the directedness of edges- from each vertex 
to every other vertex of this set (see figure III Al) . The 
identification of SCC has been applied to a diverse num- 
ber of systems from the Internet [44| to metabolism and 
neural wiring network [45J. Recently, it has been de- 
scribed that yeast GRN presents a giant SCC [iq] . con- 
trasting with an acyclic and feed-forward organisation of 
E. coli H3- 

In this work, we explore the topological constraints 
derived from wiring patterns of causal dependencies in 
a comparative analysis of yeast, E. coli and B. subtilis 
by defining their qualitatively relevant causal cores. We 
explore their internal organisation in terms of irreducible 
computational entities at the level of gene regulatory sys- 
tems. As will be shown below, our analysis (using a more 
updated version of E. coli GRN) revealed relevant dif- 
ferences in relation to previous results published in the 
literature. 



II. RESULTS 

A. Causality, dynamics and topology 

Causal relations in GRNs can be described in terms of 
directed graphs (in short digraphs) [U, [lj|. A graph is 
constituted by a set of vertices or nodes (here the genes) 
and the set of edges linking them (the relations among 
genes) . The regulatory effect of a TF gene Vi on a specific 
target gene Vj is depicted by an arrow (vi — > Vj) in the 
graph plot. 

If the vertices are genes, a chain of connections of dif- 
ferent TFs corresponds in graph theory with a directed 
walk\6^ and it can be interpreted as a causal chain. Inter- 
estingly, all TFs exhibit outgoing links, whereas non-TF 




FIG. 2: Bow-tie structure depicting an incoming component 
(IC), two strongly connected components (SCCs) and the 
outgoing component (OC). 



genes (the target ones) only receive arrows from the TF 
set. The number of outgoing links of a vertex is known 
as out-degree (denoted by k out ) whereas the number of 
incoming edges is the in-degree (/cj„). Since a TF can be 
a regulatory target of other TFs, they can exhibit both 
incoming and outgoing links, allowing feed-backs to oc- 
cur. 

A specially important information about the organi- 
zation of directed graphs is provided by the analysis of 
its connected components, i.e. the graph in which every 
pair of distinct vertices has a walk -see appendix. In di- 
graphs, directed walks are non trivially organised. The 
Internet is a very illustrative example of that [44j exhibit- 
ing a bow-tie structure with three different connected 
components: 1) a central strongly connected component 
(SCC), namely a subgraph for which every two vertices 
are mutually reachable [43|, |48| , 2) a set of incoming com- 
ponents (IC) composed by the set of feed-forward path- 
ways starting from vertices without in-degree and ending 
in SCC and 3) a set of outgoing components (OC) where 
pathways starting from SCC end in vertices without out- 
degree (see figure III A|) . Typically, the pattern of activa- 
tion of a given gene has a time-dependent causal relation 
with the state of the set of genes affecting it, and every 
vertex m can acquire a given state from a number of pos- 
sible states. Specifically, the state of Vi at time (t + 1) 
is influenced by the state of the vertices affecting it at 
time t -see appendix. The equations describing the dy- 
namical behaviour of a given vertex Vi at time t + 1 can 
be formulated in different ways, including Boolean dy- 
namics [5(| , threshold nets [4lf as figure U illustrates 
or coupled differential equations 51!]. These models are 
different but all show a common principle of causality: 
the state of a given vertex Vi at time t + 1 is exclusively 
defined by the vertices affecting it at time t. No matter 
our choice of the dynamical equations, the patterning of 
links must strongly influence system's behaviour. 

The presence of a cyclic walk -a directed walk where 
the first and the ending vertices are the same- is a neces- 
sary (but not sufficient) requisite for a periodic solution. 
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This is due to the fact that, in interpreting directed edges 
as causal relations, a cyclic graph implies that every ver- 
tex is indirectly affected by itself. When we consider 
the whole graph, an overlapping of cycles originates the 
SCC set that can be governed by the IC set. As we 
shall show below, these two structures represent the core 
of the graph that qualitatively constrains its dynamical 
complexity. By contrast, in linear directed walks, the up- 
streamest element fully determines the final state of the 
elements belonging to this walk. 

A set able to properly capture the relevant compo- 
nents qualitatively affecting global behaviour should re- 
move linear paths from the graph. Under this view, we 
can identify this set by means of a straightforward itera- 
tive algorithm. Given the underlying influence of causal 
relations on system's dynamics, we will use the label dy- 
namic for most of our definitions. 



B. Dynamical backbone 

We compute the dynamical backbone (DB) of a given 
directed graph Q (to be noted DB(Q)) by the iterative 
pruning of vertices with k ou t — from the initial graph, 
but maintaining those vertices whose ki n — (figure [3]). 
This algorithm belongs to the family the so-called leaf 
removal algorithms |9. l52l lo3| that have been previously 
used for the analysis of GRNs @, ©, H| . 

Figure^ illustrates the mechanism of pruning. Notice 
that, in contrast with previous proposals of leaf-removal 
algorithms, the DB(Q) also keeps the single root vertices 
i.e. those such that k in = which appear isolated. Single 
root vertices are special because their state is only exter- 
nally changed and are not influenced by other genes. We 
observe that, in general the DB(Q) displays more than 
a single connected component (see figure Interest- 
ingly, this dynamically relevant subgraph coincides with 
the union of the previously described IC and SCC sets 
of a directed graph. By contrast, OC is associated with 
another interesting subgraph, to be indicated as DB'(Q) 
formed by the fraction of the net that exclusively displays 
feed-forward structures. See appendix for its formal con- 
struction. 

The application of the above defined algorithm leads to 
a drastic reduction of network complexity, preserving the 
computational structures responsible of the qualitative 
behaviour of the net. 



C. Dynamical Modules and Hierarchy 

As we argued above, the core of causal relations of a 
GRN is captured by its DB. Furthermore, the causal re- 
lations inside this DB can display some kind of hierarchy. 
As we shall see DB organisation can be studied by SCC 
detention and the so-called graph condensation process. 
Mathematical appendix formally details the treatment of 




FIG. 3: Dynamical Backbone definition, (a) an example 
of the iterative elimination of vertices without out-degree 
(kout = 0). Different layers indicate the successive iterations 
of the algorithm. In (b) we indicate the dynamical modules 
of the DB(Q). Maximal cycles larger than one vertex are 
labelled by capital letters. Figure (c) shows the hierarchi- 
cal organization of the collapsed dynamical modules, i.e., the 
feed-forward organization obtained after the condensation of 
the graph DB(Q) -see text. Notice that the DB(Q) exhibits 
isolated self-interacting vertices that, by definition, are dy- 
namical modules but auto-loops are maintained in order to 
provide a more informative picture. 



GRNs that uncovers the order relation (i.e. the hierar- 
chical order) among the elements of the net. 

Figure [3Jd illustrates the process for module identifica- 
tion by SCC detection. Interestingly, the nodes belong- 
ing to a SCC are mutually affected and therefore, they 
all contribute to the state definition of all the set (see A 
and B regions of the toy model described in figure Uk). 
The complexity of these topological entities cannot be re- 
duced and thus, SCC concept defines an irreducible unit 
of causal relations (see mathematical appendix for for- 
mal definition). In this sense, the SCC can be identified 
with the module concept, since it constitutes an identifi- 
able semiautonomous part of the system [27j , capturing a 
part of the dynamical complexity of the system. There- 
fore, we say that, within the context of GRNs, SCCs 
represent a sort of dynamical modules. We finally note 
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that, consistently with definition, a single vertex belong- 
ing to the DB but not participating in any cycle is itself 
a dynamical module. 

Interestingly, when the SCC is represented by a single 
vertex, the resulting graph is a directed acyclic graph 
(see figure [3fc) . Such an operation is commonly referred 
in the literature as the condensation of a graph [43| . 

The result after the condensation process is precisely 
the feed-forward organisation what enables us to define 
an order relation among the elements of the DB. It is 
worth stressing that an order relation cannot be estab- 
lished in cyclic wiring. Condensation process allows us 
to identify such cycles into SCCs representing dynami- 
cal modules. In this way, the internal organisation in a 
hierarchy of causal relations is revelaled for the DB. 



D. E. coli dynamical backbone 

We extracted the GRN for E. coli K-12 prokary- 
ote from the available information in RegulonDB 6.0 
database [54[ . The resulting network was a directed 
graph with 1607 vertices (43 of them with k in — 0) and 
4141 links with a giant component of 1589 vertices and 
average degree (k) — 5.1 [See Methods for network con- 
struction]. The network included a total of 156 vertices 
with k out > 0, corresponding with transcription and a 
trans-acting factors, and 1451 vertices with k out = 0, i.e, 
the target genes. 

From the E. coli GRN, the DB subgraph was obtained 
after the second iteration of the algorithm. The DB sub- 
graph consists of 142 vertices (8.9% of the GRN) dis- 
tributed as follows: 33 single root vertices and a set of 
subgraphs with 109 vertices (10 of them with ki n = 0) 
and 279 edges. This set is organised in 9 graphs: a 
giant component of 100 vertices ((fc) = 3.7), another 
component displaying two elements and 7 isolated self- 
interacting vertices. We found that the size of the studied 
DB is markedly smaller than the expected in a null model 
obtained by randomisation process. In this methodology, 
node degree is conserved but correlations among nodes 
are destroyed (see methods section). Differences in the 
DB size between E. coli and randomised counterpart fea- 
ture the biological fact of that only a small fraction of 
genes in the genome operates over a majority of genes 
without a direct role in the regulation of transcription. 
As expected and in agreement with the direct observation 
of RegulonDB, the genes belonging to DB are described 
as either transcription or a factors. Only two excep- 
tions were found: the transcription anti-terminator cspE 
and trmA, a tRNA metyltransferase (according to Regu- 
lonDB 6.0 [see SI for a biological function of DB genes]). 
Functional analysis based on gene ontology annotation 
confirmed the significant (P < 10E — 5) overabundance 
of general functions related to regulation of transcription, 
signal transduction and transcription initiation (see table 
U and SI2 for a detailed statistical analysis) . 

Concerning the previous observations, we obtained a 



graph resulting from the interactions among TFs (topo- 
logically, the nodes with out-degree) in GRN. According 
its definition, DB is also a subset of TF subnet. Fig- 
ure Oi inset shows that DB size for randomised TF-TF 
subgraph is fairly smaller than the E. coli DB. We also 
found that the number of root nodes expected by chance 
in both GRN and TF subgraph follow a similar behaviour 
than the observed for DB size. These results show that 
DB structure for E.coli exhibits a markedly different or- 
ganisation than the expected in a null model (see figure 

lib). 

Figure |4Jl shows the DB organisation after the con- 
densation process. The analysis of the condensed DB 
revealed five dynamical modules larger than one vertex 
(figure 0Jd). Interestingly, the size of SCCs are markedly 
smaller in both GRN and TF randomised sets. Notice 
that figure [5] depicts large values for SCC sizes equal to 
one in randomised networks, however, they correspond 
with the fraction of the non condensed network. Typi- 
cally, randomised networks exhibited a single large SCC 
component (within the range of 800-900 nodes), occa- 
sionally together with a very small SCC, rarely larger 
than three nodes (data from SCC distributions within 
networks are not shown). It is worth to note that no 
randomised web, neither for GRN nor TF-TF subgraph 
sets produced exclusively small SCCs as observed in real 
data. This indicates that GRN is not devoid of cycles 
but they are distributed along the DB and confined in 
dynamical modules of small size. 

We found that the condensed DB captures the hier- 
archical behaviour of the largest graph component evi- 
denced by a feed-forward order relation with six layers of 
downstream dependencies (see figure 2]) . By definition, 
the first layer contains all the vertices with ki n = but 
we can see that it also includes the largest dynamical 
module. 

The largest module (A in figure [4]) contains four of the 
seven a factors described for E. coli (see table Q] and SI2 
for statistical details). These elements are responsible for 
transcription initiation. Together with the primary ini- 
tiator factor rpoD (a 70 ), we found the alternative ones 
operating under heat shock stress (rpoH and rpoE, cor- 
responding with a 32 and <r 24 , respectively). In addition, 
rpoN (<t 54 , initiator of nitrogen metabolism genes) is also 
part of the module. The second largest hub (a highly 
connected vertex) in the DB is the crp gene, also known 
as CAP (catabolite activator protein). CAP is a gen- 
eral regulator that exerts a positive control of many of 
the catabolite sensitive operons as a sensor of glucose 
starvation. Interestingly, the members of module A par- 
ticipates in the 44% of the total number of links of the 
GRN, indicating the relevance of this module in defining 
the state of the whole network. 

Other relevant factors are co-localised in this dynami- 
cal module such as those TFs related to nutrient sensor 
and assimilation (phosphate sensor system, phoB, as well 
as nucleoside (cytR) and arginine (argP) transport con- 
trol. Notice that the initiator factor of DNA replication 
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TABLE I: A selection of statistically over-represented biological functions terms from gene ontology annotation (GOA) is shown 
for DB, the root node set and dynamical modules (A to E), accordingly functional analysis for E. coli. Notice that three genes 
of the DB were not found in GOA annotation (see SI2 for a detailed analysis). The number of genes belonging to a specific 
GOA term is depicted in the first column. The fraction of genes associated to specific GOA respecting to the total number of 
genes for each particular set are shown for P > 0.05 (P is the level of significance). Highly significant results (P < 10E — 5) 
are indicated with (**). Values within the interval of 10-E — 5 > P > WE — 3 are marked by (*). 



Biological Function 


# genes in GO 


Fraction of the GO term in 


the analysed set 




E. 


coli Root DM A DM B 


DM C DM D DM E 




DB 


nodes 





Regulation of 
transcription, 
DNA-dependent 


354 


135/136*37/37* 


10/11* 4/4** 


3/3 


2/2 


2/2 


Regulation of metabolism 


399 


136/136 37/37** 


11/11*4/4** 


3/3 


2/2 


2/2 


Two-component signal 
transduction system 
(phosphorelay) 


91 


26/136* 6/37** 










Transcription initiation 


7 


7/136* 


4/11* 1/4 








Negative regulation of 
cellular process 


36 


10/136* 4/37** 


2/11 








DNA replication 


62 




3/11** - 








Response to heat 


4 




1/11 








Response to antibiotic 


60 






2/3 






Carbohydrate transport 


112 








2/2 





initiator (dnaA) is associated with this group, and we 
also found two specific TF expressed under stress con- 
ditions (lexA and cpxR). Similarly, the other four mod- 
ules include key genes associated to adaptive responses to 
changing environmental clues. These include homeosta- 
sis in acid environment (stress responses to high osmolar- 
ity, module B), antibiotic resistance (lead by mar A and 
marR, module C), glucitol use (module D) or responses 
to oxygen changes (module E). 

In summary, the E. coli DB describes a hierarchical 
feed-forward network with a strong fan-like pattern domi- 
nated by a single irreducible subset of TFs. This pattern 
reveals, in terms of a computational design, a strongly 
centralised organisation, with a well defined node affect- 
ing multiple layers of activity. The largest hub has no in- 
put from other members of the DB and thus only affect 
to other's behaviour. Within the DB, we identify the 
dynamical modules responsible for the control of tran- 
scriptional replication under both normal and stress con- 
ditions, control of metabolism, DNA replication as well 
as assimilation of essential sources of nitrogen and phos- 
phorous. 



E. B. subtilis dynamical backbone 

B. subtilis GNR was obtained from DBTBS [H [see 
Methods for network construction]. The resultant net- 
work consists of 922 nodes (159 with k ou t > 0, 83 of 
them with = and 763 with k out — 0) and 1366 di- 
rected links ((k) = 2.96). A component of 892 nodes was 
predominant and the remaining nodes made part of very 
little isolated subgraphs (one of five nodes, one of three, 
10 of two nodes and two isolated self-interacting nodes). 

The computation of DB required four iterations be- 
fore the acquisition of the stable graph. The process re- 
moved 778 nodes, resulting a DB of 144 genes (10.5 % of 
the GRN) . In this set, 73 were isolated root nodes and 
71 nodes were distributed in 8 connected components (a 
big one of 59 nodes, one of five, one of two and 5 self- 
interacting isolated nodes) with 108 links. Analogously 
to the observed for E. coli, statistical analysis of GOA 
terms showed and over-abundance of regulation of tran- 
scription metabolism and cell signalling (see table HT1 and 
SI2 for statistical details) in both DB and root nodes set 
(see table [TT] and SI2 for statistical details) . Other func- 
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FIG. 4: Dynamical hierarchy and modularity of E. coli GRN. Dynamical Backbone of E. coli after collapsing dynamical 
modules revealing a causal hierarchy (above) . Different levels of downstream information processing are indicated by means of 
different vertex labelling. Dynamical modules larger than one vertex are represented by an individual node labelled by capital 
letters. Auto-links in single-node module are preserved in the graph for a more informative picture. Below E. colt DB, the five 
(A to E) dynamical modules larger than one node are shown. 




DB size number of root nodes SCC size 

FIG. 5: Randomisation for E. coli GRN and its respective TF-TF subgraph (figure insets). Normalised counts represent the 
number of times that a value of the measured estimator {DB size, number of root nodes and SCC size) appears, normalised 
by the number of repetitions (1000 randomised networks) 



tions relating to transcription termination and positive 
regulation process appear in B. subtilis DB. 

We note that comparison of functional analysis among 
species must be taken with caution due to the different 
coverage of genes in GOA terms that may produce a bias 



in the interpretation of results. In order to provide a more 
complete picture, we consider functional information an- 
notated in DBTBS. However, we cannot exclude some 
discrepancies derived from the use of different sources. 
Similarly to E. coli, statistical functional analysis is com- 
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TABLE II: A selection of statistically over-represented biological functions terms from gene ontology annotation (GOA) is 
shown for DB, the root node set and dynamical modules (A to E), accordingly functional analysis for B. subtilis. The number 
of genes belonging to a specific GOA term is depicted in the first column. Notice that 8 genes of 144 belonging to DB were 
not found in GOA and therefore not considered in this analysis. The fraction of genes associated to specific GOA respecting 
to the total number of genes for each particular set are shown for P > 0.05 (P is the level of significance). Highly significant 
results (P < WE — 5) are indicated with (**). Values within the interval of WE — 5 > P > WE — 3 are marked by (*). 



Biological Function # genes in GO Fraction of the GO term in the analysed set 

B. sub- Root DM A DM B DM C DM D DM E 
tilts DB nodes 



Regulation of 
transcription, 
DNA-dependent 


oo ( 


113/121*58/62* 2/2 


Zj Z 


Zj Z 


z/z 


■5 jo 


Regulation of metabolism 


368 


114/121* 


58/62* 2/2 


2/2 


2/2 


2/2 


3/3 


Two-component signal 
transduction system 
(phosphorelay) 


99 


8/121* 


12/62* - 




2/2 






Transcription initiation 


26 


14* 


5/62** 2/2 


2/2 






1/3 


Negative regulation of 
cellular process 


15 


8/121* 


3/62 










Positive regulation of 
cellular process 


8 


6/121* 


6/62* 










Transcription termination 


6 


3/121 


2/62** - 










Sporulation 


258 






2/2 






3/3 


Phosphate transport 


10 








1/2 







plemented by a direct function examination of the genes 
belonging to DB in DBTBS (see SI for functional infor- 
mation) . 

Concerning to null model comparison, figures [7^, and 
b suggest a similar behaviour for DB and the set of root 
nodes in randomised networks than the observed for E. 
coli. 

SCC calculation revealed five dynamical modules as 
illustrated in figure [S] Four of them in the same 
54-connected component and the other one in the 5- 
component component. Similarly to E. coli, a single 
markedly large SCC is expected in randomised networks 
(see figure |7t). In the same way, the acyclic graph result- 
ing by SCCs condensation revealed a strong hierarchical 
character, similar to its E. coli bacterial counterpart. In 
this case, module A consisted of a cross-regulation of sigA 
and sigD sigma factors. These two nodes participates in 
the 26% of the total number of links of the GRN. This 
agrees with the essential role of these genes. SigA, also 
known as cr 43 or rpoD, is an essential gene, primary a fac- 
tor of this bacterium. It is also worth nothing that the 



equivalent rpoD gene occupies a similar position on the 
top of the E. coli condensed DB (module A in figured]). 
The module partner of this gene in B. subtilis, sigD or 
cr 28 , is a sigma factor required for flagellum and motility 
genes involved in chemotaxis process. This is different 
from E. coli, since its equivalent gene fliA (also cr 28 ) does 
not belongs to the E. coli module A but it has a direct 
downstream dependence of this module. 

As illustrated in figure [SJ modules C and D receive 
a direct input from the master module A. They appear 
related to signal transduction systems (resD and comk 
in module C and D, respectively). Interestingly, a cross 
dependence between respiration and phosphate assimi- 
lation is captured in module C. It has been described 
that resD can also play a regulatory role in respiration 
whereas phoD makes part of a molecular system involved 
in the regulation of alkaline phosphatase and phosphodi- 
esterase alkaline phosphate, two enzymes involved in the 
uptake of free phosphate groups from the environment. 
Moreover, a catabolite repression factor, ccpA, responsi- 
ble for the repression of carbohydrate utilisation and the 
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FIG. 6: Dynamical hierarchy and modularity of B. aubtilis GRN. Dynamical Backbone of B. subtilis after collapsing dynamical 
modules revealing a causal hierarchy (above). Different levels of downstream information processing are indicated by means of 
different vertex labelling. Dynamical modules larger than one vertex are represented by an individual vertex labelled by capital 
letters. Auto-links in single-node module are preserved in the graph for a more informative picture. Below B. subtilis DB, the 
five (A to E) dynamical modules larger than one node are shown. 



activation of excretion of excess carbon affects this mod- 
ule. This suggests a strong relation between phospho- 
rous and carbon assimilation with respiratory regulation. 
In other hand, the rok-comk genetic circuit described in 
module E appears involved in the expression of the late 
competence genes. 

In contrast to what occurs in E. coli, the most complex 
module is not at the top of the hierarchy. Module E is 
a three component gene (much smaller than for E. coll) 
closely related to sporulation process. It is noteworthy 
that this module receives inputs from the competence- 
related module E and another sporulation module (see 
table [TTJ) is found in an independent component. This 
is the case of module B which does not belong to the 
same DB connected component of the remaining mod- 
ules. It contains a sporulation-specific a factor (sigG, 
together with spovT controlling factor). Interestingly 
this module, significantly associated to sporulation func- 
tion (see Table HIT) . is affected by another sporulating a 
factor (sigF). Another relevant trait is the link between 
sporulation and response to nutritional stress performed 



by spoOA and the catabolite repression role of abrB in 
module D. 

From a topological point of view, the most obvious 
difference between E. coli and B. subtilis is the simpler 
composition of dynamical modules for bacillus. We must 
stress that the knowledge of these organisms -specially 
for B. subtilis, which is much less known that its E. 
coli counterpart- is in constant advance. Thus, a vari- 
ation in the wiring of these modules cannot be neglected. 
However, differences in module conformation might also 
be due to the large phylogenetic distance among these 
bacteria, as well as the extremely different environments 
where they live. E. coli is a gram-negative enterobacte- 
ria unable to sporulate that lives in the intestine of warm 
animals, whereas B. subtilis is a free living gram-positive 
organism, usually found in soils and with the ability to 
sporulate. However, in spite of these differences, a very 
similar architecture with a strong hierarchical character 
of its DB is shared. Once again, a single only-output 
hub is observed. As show below, this is markedly differ- 
ent from what is found in our eukaryotic example. 
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FIG. 7: Randomisation for B. subtilis GRN and its respective TF-TF subgraph (figure insets). Normalised counts represent 
the number of times that a value of the measured estimator {DB size, number of root nodes and SCC size) appears normalised 
by the number of repetition (1000 randomised networks) 




FIG. 8: Dynamical hierarchy and modularity of Saccha- 
romyces cerevisiae GRN. Dynamical Backbone of yeast after 
collapsing dynamical modules (above). One single dynamical 
module is represented by a vertex labelled by A letter. Be- 
low, the A dynamical module. As in the other diagrams, box 
legend shows the levels of computation. 



F. Yeast dynamical backbone 

GRN of Saccharomyces cerevisiae eukaryote was ob- 
tained from a compilation of different genome scale tran- 
scriptional analyses @] [see Methods for network con- 



struction]. The resulting network was a directed graph 
consisting of a single connected component with 4441 
vertices (29 of them with ki n = 0) and 12900 links, lead- 
ing to a (k) — 5.8. The network included a total of 157 
vertices with k out > and 4284 vertices with k out = 0, 
corresponding with the transcription factors and targets 
genes analysed in the different datasets compiled in [8j. 

The yeast DB subgraph was obtained after three it- 
erations of DB pruning. The DB was formed by a set 
of 109 vertices (2.45% of the GRN), being 17 of them 
isolated single root vertices, and a single connected com- 
ponent displaying 92 vertices and 318 edges, leading to 
(k) = 6.1. Analogously to bacteria GRNs, randomisation 
process produced networks with much larger DB with 
a smaller number of root nodes and a very large SCC 
compared to real yeast GRN (see figure H]). However, a 
different behaviour was observed from yeast TF-TF sub- 
graph producing a fairly bigger DB after randomisation 
process. This suggests that although bacteria and yeast 
DB exhibited a similar size, they would differ in their 
DB organisation. 

Functional analysis revealed that DB consists of el- 
ements responsible for regulation of transcription and 
metabolism as observed in bacteria. In addition cell cy- 
cle related functions and response to chemical stimulus 
appeared significantly over-represented in DB, the root 
node set and SCC (see table IIIII and SI2 for detailed sta- 
tistical analysis). 

All the 109 genes of the DB appear included in the 
list obtained from Q defining the TF genes of the yeast 
GRN. In addition, we checked that all DB elements, but 
five were clearly identified as TFs in the current version 
Saccharomyces genome database |56|. However, they all 
were classified as regulators of gene expression [Sec SI for 
biological details of DB genes] . 

The condensation process of yeast DB revealed a single 
dynamical module of 60 vertices (module A in figure [5£i) 
with a high average degree ({k) = 6.6) that contrasts 
to SCC distribution in bacteria. The module actually 
included more than a half of the DB, tied to a much less 
marked hierarchy (see figure [5Jd) than observed in the E. 
coli DB. However, the number of layers (seven) was very 
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FIG. 9: Randomisation for B. subtilis GRN and its respective 
TF-TF subgraph (figure insets) . Normalised counts represent 
the number of times that a value of the measured estima- 
tor (DB size, number of root nodes and SCC size) appears 
normalised by the number of repetition (1000 randomised net- 
works) 



close in both organisms, as figure^ indicates. 

We can see that the large dynamical module occupies 
a central position within the DB. Overall, the yeast DB 
resembles the so-called bow-tie organisation observed in 
the Internet [44[ and metabolism (5?j] where incoming 
inputs are integrated in a large component leading to a 
set of outgoing outputs. This substantially differs from 
the hierarchical character of E. coli DB and from the 
point of view of computation it reveals a much higher 
integration and pre-processing of the input-output paths. 

Interestingly, module A contains a marked over- 
abundance of cell cycle functional terms (P < 10E— 5, see 
table Mil and SI2 for detailed statistical analysis). Direct 
observation of database information indicated that TFs 
for cell cycle such as swi4, swi6 and swi5 (they control 
G\ related genes and they are also involved with DNA 
repair), hcml (S phase related genes), yhpl and yoxl 
(M/Gi phase) and fkhl-2 (G2 phase), among others. 
Moreover, the swi TFs located in module A make part of 



the 5 TFs contained in the minimal dynamical network 
suggested for yeast cell cycle [Til . Although this network 
combines protein modification besides transcription, we 
observe the all TFs in Li's network are included in the 
yeast DB. The two remaining TFs in Li's module that 
are not inside module A (mcml, mbpl ) are upstreamly 
located in the hierarchy, indicating a master control over 
the swi factors (and the remainder module A partners) 
at transcriptional level. It is also found a number of TF 
controlling the assimilation of carbon sources, amino acid 
assimilation (gal, adrl, aye7, mygl-2, putS, arg81, leuS, 
glnS and gcn4), nitrogen compound degradation (dal80- 
81, gzfS, gall), stress response (msn4,sutl,yapl-6, skol, 
smpl and msn2) and sporulation (rimlOl). A detailed 
description of biological functions derived form manual 
analysis of module A and yeast DB are detailed in SI. 



III. DISCUSSION 

The final goal behind any network-based analysis is in 
most cases to reach a systematic picture of what is rele- 
vant and what is not. According to this philosophy, we 
have presented a comparative study of the best known 
up-to-date GRNs by exploiting their most fundamen- 
tal trait: the principle of causality. The GRN can be 
seen as a roadmap of the computational processes link- 
ing external stimuli with adaptive cell responses and this 
can be reduced to the wiring pattern of causal depen- 
dencies among TFs and their target genes. Although 
we ignore here any dynamical features, stability proper- 
ties and other relevant components of GRN complexity, 
the construction process is unambiguous and leads to a 
unique, logical description of the causal core of a GRN. 

By using the directed nature of the network, we can 
systematically reduce network complexity from thou- 
sands of elements to a much smaller subset including 
all the causally relevant parts of the GRN. Such a 
causal/computational perspective is consistent with the 
nature of regulatory maps. No matter how they behave 
exactly (and thus how they would be modelled) the set of 
non-reducible causal links should capture most of what 
is dynamically relevant. The approach presented here 
combines the use of leaf-removal and SCC identification 
algorithms for directed networks. These methods do not 
make any assumption or require any probabilistic esti- 
mator providing a unique solution where the presence of 
a modular hierarchy can be easily depicted. 

This approximation to the identification of the basic 
computational/causal units differs from others such as 
network motif identification. Dissecting networks into 
motifs provides a systematic classification of GRNs in 
their small building blocks Q. By using the abundance 
of small subgraphs in a given network. However, the de- 
tection of a subgraph in a given net requires predefined 
decisions on what is to be measured. By contrast, SCC 
detection is not restricted to the search of a particular 
subgraph configuration and makes possible the identifi- 
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TABLE III: A selection of statistically over-represented biological functions terms from gene ontology annotation (GOA) is 
shown for DB, the root node set and dynamical modules (A to E), accordingly functional analysis for S. cerevisiae. Notice that 
only two genes of the DB were not found in GOA annotation (see SI2 for a detailed analysis) . The number of genes belonging 
to a specific GOA term is depicted in the first column. The fraction of genes associated to specific GOA respecting to the 
total number of genes for each particular set are shown for P > 0.05 (P is the level of significance). Highly significant results 
(P < WE — 5) are indicated with (**). Values within the interval of 10-E — 5 > P > WE — 3 are marked by (*). 



Biological Function # genes in GO Fraction of the GO term 

S. cere- Root nodes DM A 

visie DB 



Regulation of metabolism 


466 


80/107* 


14/27* 


51/60* 


Regulation of transcription, DNA-dependent 


327 


69/107* 


10/27** 


44/60* 


Positive regulation of cellular process 


101 


33/107* 


7/27* 


21/60* 


Negative regulation of cellular process 


191 


17/107* 




13/60* 


Response to chemical stimulus 


270 


20/107* 


5/27 


11/60** 


Cell cycle 


412 


21/107** 




17/60* 


Transcription initiation 


56 


5/107 






Gl/S-specific transcription in mitotic cell cycle 


12 


9/107* 




9/60* 



cation of entities that cannot be further partitioned due 
to the cyclic behaviour of this special set. As we stated 
in the introduction section, cycles are at the basis on a 
non-trivial computation 58] and, as we have shown, SCC 
is a topological concept that may be interpreted as a sort 
dynamical modules (identifiable semiautonomous entities 
[271 ]) within the framework of GNR. Attending to this, we 
have shown that these dynamical modules are organised 
within the core of computationally relevant dependen- 
cies that is the DB. Interestingly, this interpretation is 
also valid for any network depicting a relation of causal 
dependencies. 

The analysis of GRNs shows that dynamical backbones 
are very small (compared with the whole network) and 
thus the basic logic of GRNs can be strongly simplified. 
The very small number of iterations required to obtain 
the DB suggests a direct control of the TF set over the 
pool of target genes, without a long chain of command 
responsible for information transmission. This is rein- 
forced by the fact that in all cases more than the 90% 
of genes were eliminated after the first iteration of DB 
pruning (data not shown). Functional analyses did not 
reveal significant differences between DB and the set of 
root nodes. Since they are at the top of hierarchy, we 
could conclude that they might be related to stimulus 
sensing. However, we must stress that the TF-gene tar- 
get relation is a part of the regulation process and TF 
and promoter alteration through cell signalling processes 
or metabolism can operate in different elements of the 
regulatory set. In this context, further efforts in the in- 



tegration of different regulatory levels would contribute 
to clarify this issue. 

The strong hierarchical DB structure found for bac- 
teria is consistent with previous observations 

Sana. 

Our analysis reveals that bacterial GRNs are almost de- 
void of cycles, although not totally excluded, as previous 
studies have suggested [13]. Moreover, our results indi- 
cate that these cycles are organised into dynamical mod- 
ules with essential biological functions, such as module 
A in E. coli. The pattern of connections of this module 
reveals a non trivial computational link between tran- 
scription and metabolic control under both normal and 
stress growth conditions. As we have shown, it is a com- 
mon shared trait in bacteria despite the relative simple 
organisation of B. subtilis when compared with E. coli 
(most probably due to a lesser knowledge of its regula- 
tory wiring). In contrast with the prokaryotic DBs, the 
yeast DB is fairly different, with a high content of cy- 
cles organised into a single SCC, (see [46j). Our analysis 
shows that this large dynamical module occupies a cen- 
tral position in the yeast DB suggesting a bow-tie causal 
organisation. 

It is worth to note that, according to its definition, 
SCC components may be substantially modified by the 
addition of new links or by rewiring them. In this con- 
text, novel findings in the gene regulatory circuitry of 
these organisms may alter the current picture of these 
networks. Now an interesting question arises. From an 
evolutionary perspective, mutations in regulatory boxes 
of target genes, may alter the structure of SCCs. Then, it 
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is reasonable to think that the regulatory system should 
be protected against this kind of perturbations. Inter- 
estingly, when comparing real networks with their null 
model counterparts both DB and SCCs are markedly 
smaller than what is expected by chance. At this point, 
the reduction of the dynamically relevant core and the hi- 
erarchical organisation of the smallest possible size SCCs 
(attending to dynamical requisites of the cell) would con- 
tribute to reduce the impact of this type of mutations. 
In addition, another layer of complexity must be consid- 
ered to play a relevant role to minimise the impact of 
this type of mutations. This is the distribution of the 
activatory and inhibitory weight in the genetic wiring. 
Interestingly, it has seen that an experimental rewiring 
of the E. coli gene regulatory network does not cause a 
dramatic impact on organism surveillance (59j . 

Uncovering the genetic circuitry of these organisms is 
still an ongoing task, and our results can only provide a 
tentative approximation to the final picture. However, 
due to the good quality of data for E. coli network, only 
slight modifications in the feed-forward architecture are 
expected. In the case of DB yeast, its cyclic character 
is a distinctive trait. Notice that the lack of knowledge 
for gene connections would not alter the cyclic character 
of DB. In addition, only a tree-like structure (similar to 
bacteria) could be suggested if further evidences demon- 
strate that current data is enriched with a high number of 
false connections causing loops in the network. This is a 
technical limitation in yeast due to the fact that TF-gene 
connections are defined according to a statistic value [see 
yeast GRN construction in Methods]. In our case, very 
restrictive conditions were used for yeast network con- 
struction in order to reduce the impact of possible false 
connections. This provides strong arguments supporting 
that different strategies for GRN organisation are present 
in these prokaryotic and eukaryotic studied organisms. 

Attending to the biology of bacteria and yeast, com- 
mon patterns can be identified. Both share an unicellu- 
lar organisation, simple life cycles (only sporulation and 
replication states for yeast and bacillus) and similar nu- 
tritional requirements (all of them are chemoheterotrophs 
absorbing molecules from the medium). However, a more 
complex behaviour appears associated to the cell cycle 
control in yeast. As a consequence, these relatively sim- 
ple organisms may operate under comparable external 
inputs and a similar GRN organisation would be ex- 
pected. This is consistent with the different output be- 
haviour observed in yeast and bacteria. It is reasonable 
to think that this difference should originate by the dif- 
ferent wiring displayed in these GRNs. 

Yeast and bacteria drastically differ in their genomic 
organisation and cellular compartmentalisation, the two 
most distinctive features distinguishing eukaryotes from 
prokaryotes. It is widely accepted that spatial compart- 
mentalisation and intronic organisation confer a more di- 
verse regulatory behaviour (oOl . fsTI . l62l [o3j and genomic 
plasticity. GRN structures could be thus understand- 
able as alternative evolutionary solutions. More complex 



wiring in yeast, in contrast to bacteria, would be the basis 
for this differential computational behaviour. 



IV. METHODS 
A. Construction of GRNs 

1. E. coli network definition and construction 

GRN for E. coli is an overlapping of two files obtained 
from RegulonDB 6.0 [54| : NetWorkSet.txt, containing 
TFs and their target genes, and SigmaNetWorkSet.txt, 
containing the Sigma factors and the genes promoted by 
them. Both files contain information about the relations, 
as well as the activator/repressor behaviour, of TF (and 
Sigma factors) over the target genes. Biological informa- 
tion was obtained from EcoCyc database [64J. In this 
work we exclude elements contributing to a TF mod- 
ification such as phosphorylation or ligand-TF binding. 
Graph pictures were performed using Cytoscape software 
( |http: / / www.cytoscape.org/ ) . 

2. B. subtilis network definition and construction 

GRN for B. subtilis was obtained by the combina- 
tion of gene, tfac and siqma field information compiled 
in DBTBS (release 5) [55j . Biolo gica l information was 
obtained from DBTBS, SubtiList j65| and Uniprot [g6| 
databases. Graph pictures were performed using Cy- 
toscape software ( [http://www. cytoscape.org/). 

3. S. cerevisiae network definition and construction 

Yeast GRN was obtained from the compilation of dif- 
ferent sources performed by [67j. Self-interactions were 
not initially included in that work and they were di- 
rectly provided by the authors. Data corresponds with 
highly confident experiments (P — 0.001 and three posi- 
tive replicas). Biological information was obtained from 
Saccharomyces Genome database SGD [56| and Uniprot. 
Graph pictures were performed using Cytoscape software 
( |http: / / www.cytoscape.orgTI ■ 

B. Null model construction and analysis 

Null model networks were obtained by a randomisation 
process of the real GRN consisting of twice of the number 
of links rewiring events. A rewiring event is realised by 
an end node interchange in two randomly selected pairs. 
Then, the arrows of the new pairs are arbitrarily inverted. 
Once obtained the new connections, the algorithm ver- 
ifies that they were not previously in the network. If 
the rewired connection matches with existent links, the 
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solution is rejected and the algorithm proceeds to a se- 
lection of two new links. Number of autoloops is kept 
constant in this randomisation process. This randomi- 
sation permits to destroy local correlations keeping the 
number of nodes, links and degree distribution. A thou- 
sand of randomised networks were constructed for each of 
the three GRNs. In addition, we also obtained null mod- 
els for the set of nodes corresponding with TF activity, 
i.e, the nodes with k OMt > (1000 randomised networks 
for each of the three GRNs). DB size, SCC size com- 
ponents and number of root nodes were calculated per 
null model network. Frequencies were normalised by the 
number of replicas (1,000 for each of the six conditions, 
i.e, one GRN and the respective TF subset for each of 
three studied organisms). 



C. Statistical analysis of biological functions 



V. MATHEMATICAL APPENDIX 
A. Causality, dynamics and topology 

A directed graph (or digraph), Q = {Vg, Eg}, is consti- 
tuted by a set of vertices, Vg = \y\, ...vn}, and the set of 
edges linking them, Eg = {ei, ejf\. Formally, an edge 
from Vi to Vj is described by an ordered pair ej, = (vi,Vj), 
depicted by an arrow in the picture of the graph Vi — * Vj . 
An alternating sequence of vertices v\,...,v n G Vg and 
edges ei,^,e n -i G Eg, defines a directed walk in a di- 
graph Q [43| if there is a set e±, ...e„_i G Eg such that, 
for all i < n, ei — (vi,i>.;+i). We denote a directed walk 
(if it exists) from v\ to v n as 7r(t>i,i; n ). We can also de- 
fine a walk between v\ and v n , i.e., a sequence of vertices 
and edges connecting v\ and v n no matter the directed 
nature of the graph. 

The number of outgoing links of a given vertex Vi G Vg 
is known as out-degree (denoted by k ou t(vi)) whereas the 
number of incoming edges is the in-degree (ki n (vi)). Once 
ki n and k out have been defined, we can obtain the average 
connectivity or average degree, to be noted (k): 



Statistics for the estimation of the overabundance 
of biological functions were estimated for the DB set, 
root nodes and the dynamical module sets for the 
three different studied organisms. Hypergeometric 
test with Benjamini and Hochberg false discovery rate 
correction, and selected significance level of 0.05 was 
applied using the gene ontology biological functions. 
In the case of B subtilis, gene ontology annotation 
by parsing information containing B. subtilis taxon 
ID. The source file, gene_association.goajuniprot 
(submissi on date 4/ 25/ 2009 ), was obtained from 
|http: / / www.geneontology.org/ GO.current.annotations.shtml[ Attendm 
Specific E. coli GO annotation was directly obtained 
from the same website. The analyses were performed 
using Bingo 2.3 Cytoscape plugging 68:]. Notice that, 
gene annotation for yeast is included by default in this 
package. Detailed information of statistical results is 
provided in SI2 supplementary file. 
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(1) 



Furthermore, given a vertex Vi G Vg, it is interesting 
to define the set of vertices affecting it, to be noted Tf. 



Ti = {v k G Vg : (v k ,Vi) 



EEg}. 



(2) 



g to their reachability, a directed graph can 
display three types of components, namely: 



1. 



The strongly connected components (SCC), which 
are the subgraphs for which every two vertices are 
mutually reachable [H, |48j . 



In addition, biological functions of the DB were manu- 
ally collected from respective organism's databases (Eco- 
Cyc and RegulonDB for E.coli, DBTBS and Subtilist for 
B. subtilis and SDG for S. cerevisiae. 



The incoming components (IC), composed by the 
set of feed-forward pathways starting from vertices 
without in-degree and ending in SCC. 

The outgoing components (OC) where pathways 
starting from SCC end in vertices without out- 
degree (see figure IITA"|) . 



1. Software implementation 

Dynamical backbone algorithm and dynamical module 
detection were implemented in a software package using 
Perl language and unix/linux commands. This package is 
available for linux/unix platforms as supplementary ma- 
terial. Graph pictures were performed using Cytoscape 
software ( http : / / www.cytoscape.org/| ■ 



1. Threshold network model 

In this subsection, we briefly define the toy model used 
in the example provided in figure (|T|. 

In a dynamical setting, the state cr(t) = 
(di(i), crjv(i)) of a system a formed by N ele- 
ments would be updated under some class of dynamical 
process. An example of such dynamics is a threshold-like 
equation, in which the state of a vertex Vi G Vg at time 
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t + 1 is updated by the sate of the vertices v k £ L^, 
namely 

Gi{t + 1) = 6 ^Wy^.(t) - 6» t j 

where 0(x) = 1 if a; > and zero otherwise. Here 6% is 
a threshold and the weights £ { — 1,0, +1} define the 
type of interaction between genes. If the state of each 
element is Boolean, i. e. <7j € {0, 1} the previous model 
provides, for a given initial state, a closed description 
of the system's behaviour. Here the matrix W = (u>ij) 
captures the structure and nature of causal links. 

The small threshold network depicted in figure (|TJ 
starts from an initial state where 04(0) = 175(0) = 1 
and CTj(O) = 0; (i ^ 4,5) for other units, a cyclic at- 
tractor (of period 12) is obtained. Here only two ele- 
ments have a non-zero threshold, namely &\ = #4 = 1/2. 
Arrows and end circles indicate lu = +1 and u) = — 1 
links, respectively. The global state M(t) is given by 
M(t) = EjLi o-i(f) (thick lines, b). 



0, DB(Q). Let us assume that, when performing recur- 
sively the operation <I> over a directed graph, we reached 
the stable state, i.e., B n = B n -\. The Dynamical Back- 
bone is a subgraph of Q, DB C Q defined as: 

DB(J3) = DB(V DB(g) ,E Bn ) = DB{V Bn UR ni E Bn ) (5) 

Once DB(Q) is defined, we can also be interested in the 
fraction of the net that exclusively displays feed-forward 
structures, to be noted DB'(Q), which does not coin- 
cide exactly with the graph DB(Q), obtained from the 
set of vertices Vi £ Vg \ Vbb{g) an d the set of edges 
{e H = (vk,Vj) £ Eg : v k ,Vj £ Vg \ V DB (g)}. To prop- 
erly identify it, we need to define the subgraph L n as 
the set of connections linking DB(Q) to DB(Q) and the 
vertices linked to them. Note that this subgraph may dis- 
play many components. Its main feature is that the links 
end in vertices outside the DB(Q) but they come from 
vertices belonging to DB(Q). We obtain the maximal 
feed- forward subgraph DB'(Q) from 

DB'{Q) = DB(Q) U L n . (6) 



B. Dynamical backbone pruning 

Consider the pruning function $ : Q — > B, where B C 
Q. This function takes a directed graph as input, being its 
output the graph without all the vertices having k out = 
(and the links pointing to them). Accordingly, 

®{Q) = B 1 {V Bl ,V El }, (3) 

where 

( V Bl = {v k £ Vg : k out (v k ) > 0} 

\ E Bl = {{v k ,Vi) £ Eg : v k ,v l £ V Bl }. 

Thus, the computation is an iterative operation: 

HQ) - B u 
B 2 = 

B n = $(S„_l) 

The resulting graph at the i-th iteration is denoted by 
Bi = Bi(V Bi , E Bi ) and the computation ends when no 
further vertex elimination occurs, i.e. B n = B n _\. If, for 
some i^na vertex is eliminated and it has no connec- 
tions with a node belonging to B k ^i we let this vertex 
alive. At every iteration, this collection of single root ver- 
tices define a set and, from these sets, we build the set 
Ri of all the single root vertices found until the i-th step: 

Ri = IJ rk ( 4 ) 

We have now all the ingredients to provide a formal 
definition of the dynamical backbone of a directed graph 



C. Dynamical Modules and Hierarchy 

Let us assume that we are in the fc-th connected com- 
ponent of DB(Q). The i-th dynamical module of the 
fc-th connected component of the DB(Q), DM k , is a set 
of vertices (and the directed edges among them) that 
constitutes an irreducible unit of causal relations. As 
we said above, the existence of cycles inside the DB(Q) 
is responsible for the possible non trivial net dynamics. 
Thus, if the fc-th component of the DB(Q) is not a single 
root node, the concept of Dynamical Module is topolog- 
ically equivalent to the SCC. As there can be a more 
than one dynamical module in a directed graph, we can 
define A k (G) as the set of dynamical modules of the fc- 
th component of the DB(Q). Interestingly, when the 
dynamical module is contracted to a single vertex, the 
resulting graph is a directed, acyclic graph (figure [3J:). 
Such an operation is commonly referred in the literature 
as the condensation of a graph [43| . Accordingly, we can 
construct a new condensed graph, 

H k (g)=H k (V Hk ,E Hk ), (7) 

where V Bk = Ak{G) and E Bk is the set of links connect- 
ing the different dynamical modules. In other words, we 
collapse the elements of every dynamical module into a 
single node and we let the links connecting different mod- 
ules of the component of the DB(Q) we are working in. 
Notice that, as a consequence, we cannot state that 

H k (g) ^ g. (8) 

Interestingly, when we consider these M's as single ver- 
tices of H k (Q), we obtain a feed-forward graph. It is pre- 
cisely the feed-forward organisation what enables us to 
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define an order relation among the elements of DB(Q). 
Such an order relation can be interpreted as the dynam- 
ical hierarchy of the network's dynamical core and it is 
defined among the different modules of DBk . If we define 
n(iffc(<?)) = {-Kx, .., 7r£j} as the set of all existing directed 
walks over all nodes of Hk(G), we can define the order 
relation " >" as: 

{DM* > DMj) 4-» (3w(DM?, DM*) e U(H k )) (9) 

Such an order relation is called the Transitive closure of 
the graph H^{Q) (43| . The above order relation provides 
our definition of causal hierarchy. 

VI. SUPPLEMENTARY FILES 

Supplemental information (SI). An excel file containing 
the biological function of the DB genes for the studied 



organisms obtaining for a manual curation process. 

Supplementary information 2 (SI2). An excel file con- 
taining the statistical functional analysis for DB, the set 
of root nodes and dynamical modules for the three stud- 
ied organisms. GRNs and respective DBs are provided 
as supplementary material in Cytoscape format. 

Software package for DB and DM identification is pro- 
vided as supplementary material (SI3). 

Supplemetary files are provided upon request. 
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